Waiting for a psychologist in Midtjylland

Purpose of the project

If you need a psychologist in Midtjylland, you might have to wait for many months to get one. The long waiting period can cause the mental issues of the waiting patients to worsen, which can result in the patient living with his or her issues for a longer period of time and maybe needing medical or psychiatric treatment, which could have been avoided (Ching-I Hung 2017, 7). The longer the patient is sick the more expensive it is for society to pay for this treatment, and it also costs society a lot of money in lost production (Flachs EM 2015, 163-195). The waiting time and thereby the risk of getting a more serious course of illness is not equally distributed through the municipalities, which makes the waiting times a problem of inequality.

About the Data

I use two data sets: 1) The first data set is a shape file from GADM called gadm36_DNK_2_sp.rds, which I assign to the Midtjylland variable. It is a spatial data frame containing information about the municipalities in Denmark including the municipality polygons. 2) The second data set waitingtime_regionmidt.csv contains the waiting times in the municipalities in Midtjylland measured in weeks in 2021. The data is from a report from Region Midtjylland. 3) The third data set population_over_18.csv consists of the adult population over age 18 in each municipality in Midtjylland in 2021. The data is from Danmarks Statistik. 4) The fourth data set psychology_practices_2018.csv contains data of the number of psychology practices in each municipality in Midtjylland in 2018. The data is from a report from Region Midtjylland.

Data set 2-4 is saved in the data folder.

The pre-processing of the data is described in the project report in the report folder.

Libraries

# packages needed for the script to run
library(sf)
library(raster)
library(dplyr)
library(mapview)
library(RColorBrewer)
library(tmap)

Load and prepare data

Municipality geometry data

Loading the municipalities shape file from GADM. I project the data to the EPSG number 25832 with the st_as_sf() and check that the polygons are valid with sf::st_is_valid():

# load data from GADM
municipalities <- getData("GADM", country = "DNK", level = 2)

# check data type
class(municipalities)  # data is a SpatialPolygons DataFrame
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
# project the data
municipalities_25832 <- municipalities %>%
    st_as_sf(municipalities_25832) %>%
    st_transform(municipalities_25832, crs = 25832)

# inspecting the data
municipalities_25832
Simple feature collection with 99 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
Projected CRS: ETRS89 / UTM zone 32N
First 10 features:
      GID_0  NAME_0   GID_1      NAME_1 NL_NAME_1     GID_2      NAME_2
36828   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.1_1 Albertslund
36664   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.2_1     Allerød
36778   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.3_1    Ballerup
37010   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.4_1    Bornholm
36900   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.5_1     Brøndby
      VARNAME_2 NL_NAME_2  TYPE_2    ENGTYPE_2 CC_2   HASC_2
36828      <NA>      <NA> Kommune Municipality <NA> DK.HS.AB
36664      <NA>      <NA> Kommune Municipality <NA> DK.HS.AL
36778      <NA>      <NA> Kommune Municipality <NA> DK.HS.BA
37010      <NA>      <NA> Kommune Municipality <NA> DK.HS.BO
36900      <NA>      <NA> Kommune Municipality <NA> DK.HS.BR
                            geometry
36828 MULTIPOLYGON (((712057 6173...
36664 MULTIPOLYGON (((700891 6191...
36778 MULTIPOLYGON (((715156 6178...
37010 MULTIPOLYGON (((878103.7 61...
36900 MULTIPOLYGON (((716929 6168...
 [ reached 'max' / getOption("max.print") -- omitted 5 rows ]
plot(municipalities_25832[7])  # plotting the municipalities geometry

class(municipalities_25832)  # sf and data frame
[1] "sf"         "data.frame"
# checking if the polygons are valid
sf::st_is_valid(municipalities_25832)  # they are all valid
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 [ reached getOption("max.print") -- omitted 24 entries ]

Then I filter the data by the NAME_1column and only keep the data from the region “Midtjylland”. I change the column name NAME_2 to the more informative name “municipalities”:

# checking NAME_1 column containing the regions
municipalities_25832$NAME_1
 [1] "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden"
 [6] "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden"
[11] "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden"
[16] "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden"
[21] "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden"
[26] "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden" "Hovedstaden"
[31] "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland"
[36] "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland"
[41] "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland"
[46] "Midtjylland" "Midtjylland" "Midtjylland" "Midtjylland" "Nordjylland"
[51] "Nordjylland" "Nordjylland" "Nordjylland" "Nordjylland" "Nordjylland"
[56] "Nordjylland" "Nordjylland" "Nordjylland" "Nordjylland" "Nordjylland"
[61] "Sjælland"    "Sjælland"    "Sjælland"    "Sjælland"    "Sjælland"   
[66] "Sjælland"    "Sjælland"    "Sjælland"    "Sjælland"    "Sjælland"   
[71] "Sjælland"    "Sjælland"    "Sjælland"    "Sjælland"    "Sjælland"   
 [ reached getOption("max.print") -- omitted 24 entries ]
# creating new data frame containing only data for Midtjylland
Midtjylland <- municipalities_25832[municipalities_25832$NAME_1 == "Midtjylland",
    ]

# change 'NAME_2' to 'municipality'
Midtjylland <- Midtjylland %>%
    rename(municipality = "NAME_2")

# checking the spelling of the municipality names
sort(unique(Midtjylland$municipality))
 [1] "Århus"             "Favrskov"          "Hedensted"        
 [4] "Herning"           "Holstebro"         "Horsens"          
 [7] "Ikast-Brande"      "Lemvig"            "Norddjurs"        
[10] "Odder"             "Randers"           "Ringkøbing-Skjern"
[13] "Samsø"             "Silkeborg"         "Skanderborg"      
[16] "Skive"             "Struer"            "Syddjurs"         
[19] "Viborg"           
# replacing 'Århus' with 'Aarhus' to match the other datasets
Midtjylland$municipality[Midtjylland$municipality == "Århus"] <- "Aarhus"

Waiting times data

I load the waitingtime_regionmidt.csvdata set and inspecting it. I translate the danish column names into English with rename() and fix misspelling of Samsø and Ringkjøbing-Skjern:

# load the waiting data
waitingtime <- read.csv2("../data/waitingtime_regionmidt.csv")

# Check column names
head(waitingtime)
       Kommune Gennemsnit.pr..1..Now.2021 Maksimum.pr..1..Nov.2021
1     Favrskov                         14                       26
2    Hedensted                          4                        4
3      Herning                         17                       27
4    Holstebro                         22                       30
5      Horsens                         17                       40
6 Ikast-Brande                         17                       22
  Minimum.pr..1..Nov.2021
1                       3
2                       4
3                       6
4                       8
5                       8
6                       6
# Translate column names from Danish to English
waitingtime <- waitingtime %>%
    rename(municipality = "Kommune") %>%
    rename(average_wait = "Gennemsnit.pr..1..Now.2021") %>%
    rename(max_wait = "Maksimum.pr..1..Nov.2021") %>%
    rename(min_wait = "Minimum.pr..1..Nov.2021")

# Check spelling of municipality names
sort(unique(waitingtime$municipality))
 [1] "Aarhus"               "Favrskov"             "Hedensted"           
 [4] "Herning"              "Holstebro"            "Horsens"             
 [7] "Ikast-Brande"         "Lemvig"               "Norddjurs"           
[10] "Odder"                "Randers"              "Region"              
[13] "Ringk\xbfbing-Skjern" "Sams\xbf"             "Silkeborg"           
[16] "Skanderborg"          "Skive"                "Struer"              
[19] "Syddjurs"             "Viborg"              
# fixing misspelling due to danish special characters
waitingtime$municipality[waitingtime$municipality == "Ringk\xbfbing-Skjern"] <- "Ringkøbing-Skjern"
waitingtime$municipality[waitingtime$municipality == "Sams\xbf"] <- "Samsø"

# Check data type
class(waitingtime)  # it's a data frame
[1] "data.frame"

Population data

I load the population_over_18.csvdata set without headers. I use colnames() to define column names for the data frame. Then I select the rows with the “Subtotal” values in the “age” column, and assign them to a separate variable, from which I remove the “age” column, which is not unnecessary. Lastly, I correct the spelling of Samsø and Ringkjøbing-Skjern.

# load the waiting data
population <- read.csv2("../data/population_over_18.csv", header = FALSE)

# define column names
colnames(population) <- c("municipality", "age", "population")

# take only the 'subtotal' rows (rows with total pop over 18 for each
# municipality)
total_population <- population[population$age == "Subtotal", ]

# remove 'age' column
total_population_NoAge <- total_population[, -2]

# check spelling of municipality names
sort(unique(total_population$municipality))
 [1] "Aarhus"               "Favrskov"             "Hedensted"           
 [4] "Herning"              "Holstebro"            "Horsens"             
 [7] "Ikast-Brande"         "Lemvig"               "Norddjurs"           
[10] "Odder"                "Randers"              "Region Midtjylland"  
[13] "Ringk\xf8bing-Skjern" "Sams\xf8"             "Silkeborg"           
[16] "Skanderborg"          "Skive"                "Struer"              
[19] "Subtotal"             "Syddjurs"             "Viborg"              
# fixing misspelling due to danish special characters
total_population$municipality[total_population$municipality == "Ringk\xf8bing-Skjern"] <- "Ringkøbing-Skjern"
total_population$municipality[total_population$municipality == "Sams\xf8"] <- "Samsø"

Psychology practice data

I load the psychology_practices_2018.csv data and correct the spelling of Samsø and Ringkjøbing-Skjern:

# load practices data
practices <- read.csv2("../data/psychology_practices_2018.csv")

# Check spelling of municipality names
sort(unique(practices$municipality))
 [1] "Aarhus"               "Favrskov"             "Hedensted"           
 [4] "Herning"              "Holstebro"            "Horsens"             
 [7] "Ikast-Brande"         "Lemvig"               "Norddjurs"           
[10] "Odder"                "Randers"              "Ringk\xbfbing-Skjern"
[13] "Sams\xbf"             "Silkeborg"            "Skanderborg"         
[16] "Skive"                "Struer"               "Syddjurs"            
[19] "Viborg"              
# fixing misspelling due to danish special characters
practices$municipality[practices$municipality == "Ringk\xbfbing-Skjern"] <- "Ringkøbing-Skjern"
practices$municipality[practices$municipality == "Sams\xbf"] <- "Samsø"

Merge data frames

Now,I merge the four data sets together with left_join(). The data frames all share the column “municipalities” which contains the same municipality names, and that makes it possible to match the waiting times, the population size and the number of practices correctly:

# merge the two data frames
waiting_midt <- Midtjylland %>%
    left_join(waitingtime) %>%
    left_join(total_population) %>%
    left_join(practices)

I add an extra column to the new data frame in which I calculate the number of citizens per psychology practice in each municipality:

# calculate number of persons per practice and create new column
waiting_midt$population_per_practice = waiting_midt$population/waiting_midt$practices

Save the new merged data set in the data_output folder:

setwd("..")  # changing position
readr::write_csv(waiting_midt, "data_output/waiting_midt.csv")

Mapping

Mapview: Which municipalities have the longest waiting times?

Now that the data is ready, I will investigate the waiting time in each of the 19 municipalities in Midtjylland. I found, that the best way to do this was to map the waiting times with mapview() to create a interactive map:

# plot waiting times using mapview
waiting_mapview <- mapview(waiting_midt[c(7, 14, 15, 16)], zcol = "average_wait",
    col.regions = rev(brewer.pal(10, "RdBu")))

# show map
waiting_mapview

Save Mapview map as HTML:

mapshot(waiting_mapview, "../map_output/waiting_mapview.html")

Tmap: Are there coincidence between waiting time and population size?

In this part of the project, I will examine if there is a correlation between long waiting time and a high number of people living in the municipality and vice versa. I do this by creating two maps with Tmap: one showing the distribution of waiting times and one showing the population sizes across municipalities:

# Mapping the population (over 18 years) of each municipality
population_map <- tm_shape(waiting_midt["population"]) + tm_polygons("population",
    palette = "YlGn", title = "Population (over age 18)", style = "jenks") + tm_layout(frame = FALSE,
    legend.position = c("right", "bottom"), main.title = "Fig 2: Waiting time and population size doesn't seem to correlate in all municipalities",
    main.title.position = "center", main.title.size = 1.1)

# Mapping the waiting times of each municipality
average_waiting_map <- tm_shape(waiting_midt["average_wait"]) + tm_polygons("average_wait",
    palette = "Reds", title = "Waiting time (weeks)", style = "jenks") + tm_layout(frame = FALSE,
    legend.position = c("right", "bottom"))

# Show the two maps
population_map

average_waiting_map

Save the maps on top of each other as one figure:

# Save the two maps in the same pdf
pdf("../map_output/waiting_and_population_maps")

# arrange the maps together
tmap_arrange(population_map, average_waiting_map)

dev.off()
quartz_off_screen 
                2 

tm-bubbles: Are there coincidence between waiting time and population per practice?

The purpose of the last map I’m creating is to investigate whether the municipalities with the longest waiting times are also the once with the most people per psychology practice connected to the public health insurance and vice versa. I create a waiting time map as above, but add an extra element to it. I find the centroid of each municipality with st_centroid() and in the the centroids I create a bubble using tm_bubbles(). The size of each bubble represents the number of people per psychology practice in the given municipality:

# create centroid for each municipality
centroids <- st_centroid(waiting_midt)

# use tmap to create a bubble (connected to each centroid) with a size
# corresponding to the number of practices
practices_map <- tm_shape(waiting_midt) + tm_polygons("average_wait", palette = "Reds",
    title = "Waiting time in weeks", style = "jenks") + tm_layout(frame = FALSE,
    legend.position = c("right", "bottom")) + tm_shape(centroids) + tm_bubbles(size = "population_per_practice",
    scale = 2.5, style = "jenks", col = "lightblue", alpha = 1, border.col = "blue") +
    tm_layout(frame = FALSE, legend.position = c("right", "bottom"), main.title = "Fig 3: Long waiting time doesn't seem to correlate with many persons per practice",
        main.title.position = "center", main.title.size = 1.2, legend.title.size = 1,
        legend.title.fontface = "bold")

# show map
practices_map

Save the map:

# Save bubble-map
tmap_save(practices_map, "../map_output/person_per_practice_map.png")

Final notes

I discuss the results of this project in the project report, which can be accessed through the report folder on Github.

References:

Ching-I Hung et al. 2017 ”Untreated duration predicted the severity of depression at the two-year follow-up point”, PLoS ONE 12(9).

Demarsylvain 2019 “How do I combine a dataframe with a spatial dataframe when receiving errors with both left_join and merge?”, Stackoverflow: https://stackoverflow.com/questions/56116443/how-do-i-combine-a-dataframe-with-a-spatial-dataframe-when-receiving-errors-with (visited 2023-06-04)

Flachs EM et al.  2015 “Sygdomsbyrden i Danmark - sygdomme”, Statens institut for folkesundhed, Syddansk Universitet, København: Sundhedsstyrelsen, 163-195: https://www.sst.dk/da/sygdom-og-behandling/~/media/00C6825B11BD46F9B064536C6E7DFBA0.ashx

Kabacoff, Robert I. 2017 “Merging data”, Quick-R: https://www.statmethods.net/management/merging.html

Marco Sandri 2020 “Add Color Palette to Mapview Map”, Stackoberflow: https://stackoverflow.com/questions/60099307/add-color-palette-to-mapview-map (visited 2023-05-20)

Naveen 2023 “How to rename column in R”, SparkBy{Examples}: https://sparkbyexamples.com/r-programming/rename-column-in-r/ (visited 2023-05-20)

UseR10085 2020 “How to put title of a map outside of the panel (tmap package)”, Stackoverflow: https://stackoverflow.com/questions/61355422/how-to-put-title-of-a-map-outside-of-the-panel-tmap-package (visited 2023-05-30)

Whatlf 2022 ”How do I stop r from using the first row of data as the column name?”, Stackoverflow: https://stackoverflow.com/questions/72958558/how-do-i-stop-r-from-using-the-first-row-of-data-as-the-column-name (visited 2023-06-02)